Power Tools 1993 November

Text File | 1992-09-01 | 31KB | 698 lines

COMPETITIVE ANALYSIS OF RISC ARCHITECTURES

Hewlett-Packard's PA-RISC Advantage versus other Competitive RISC Architectures

TABLE OF CONTENTS

Executive Summary
The RISC Market
Architecture versus Implementation
Hewlett-Packard's PA-RISC
PA-RISC versus Other Competitive RISC Architectures
Conclusion
Appendix A
Glossary
Table 1: PA-RISC Architecture Feature Comparison
Table 2: PA-RISC Implementation Comparison

Executive Summary

This paper compares Hewlett-Packard's RISC (Reduced Instruction Set Computing) architecture, named Precision Architecture RISC (PA-RISC), with other competitive RISC architectures (Sun's SPARC, Mips RX000, IBM's POWER, and DEC's ALPHA). It describes the implications of the architectural differences and the benefits they bring to HP's customers.

This paper discusses (in the following order):

* the RISC market and its rapid growth over the last several years.
* the relative importance of an architectural comparison, as opposed to a comparison of implementations.
* HP's PA-RISC architecture design goals.
* each of the architectures, from a high-level perspective, in a number of key architectural areas.
* the overall relative strengths and weaknesses of each architecture.

RISC technology offers the potential for dramatic price/performance improvements over computers with traditional CISC (Complex Instruction Set Computing) technology. Although computers based on RISC technology share some common architectural attributes, major design differences exist.
These architectural differences translate into performance, flexibility, and growth advantages for the customer who chooses the appropriate family of RISC processors.

In summary, HP 9000 Series 800 Business Servers and Series 700 workstations, and HP 3000 Series 900 Business Systems are all based on HP's leading-edge RISC architecture. PA-RISC is a carefully designed computer architecture that provides the following benefits over other competitive RISC architectures:

* PA-RISC is the leader in performance. PA-RISC's performance growth has been about 60% per year for the last 5 years. Performance over the next 5 years is expected to grow even faster, closer to 75% per year.
* PA-RISC is the most flexible architecture because it is designed for both the commercial and technical environments. With PA-RISC, HP offers products from the desktop to mainframes with 100% object-code compatibility, and the architecture is built for future needs.
* PA-RISC is extensible so it can adapt to technology change.
* PA-RISC supports over 4500 applications today. It has thousands of leading edge applications across many industries in the commercial, manufacturing, engineering, and scientific markets.
* HP's PA-RISC design combines fewer components, advanced VLSI technology, a reduced number of boards per system, high reliability, and lower support costs. This combination provides a measurably lower overall cost of computing compared with other competitive systems. As a result, PA-RISC systems are smaller and take up less floor space, consume less power, and require less cooling than competitive systems.

[Figure: 9000-116.gal]

The RISC Market

Since the mid-1980s, RISC processors and systems have revolutionized the commercial computer market. This technology has been accepted by all the major computer vendors: HP, IBM, DEC, and Sun.
According to the RISC Management Newsletter, RISC-based system shipments have grown from $1.95 billion in 1988 to $18.5 billion in 1991. RISC is not a fad. It is a reality.

[Figure: riscmkt.gal]

Of the leading computer vendors, HP started shipping PA-RISC based computers in 1986, followed by Sun/SPARC in 1987, DEC/Mips in 1989, and IBM/POWER in 1990.

[Figure: timeline.gal]

All the major computer vendors have moved to RISC. The reason is that RISC brings users value. RISC provides higher performance at a dramatically lower cost, in terms of $/MIPS, than CISC architectures.

[Figure: movrisc.gal]

As a result of HP's early commitment to RISC and the viability of PA-RISC in commercial applications, HP is the leading RISC system vendor. According to Infocorp, HP has 31.1% of the 1991 worldwide RISC market share based on revenue dollars of RISC desktop and multiuser systems, Sun has 19.5%, IBM has 13.2%, and DEC has 11.8%.

[Figure: riscpie1.gal]

Also according to Infocorp, HP has 51.3% of the 1991 worldwide multiuser RISC market share based on revenue dollars of RISC multiuser systems, DEC has 12.5%, Sun has 7.8%, and IBM has 5.8%.

[Figure: riscpie4.gal]

Architecture versus Implementation

Various proposals for drawing the line between computer architecture and computer implementation exist. The original definition limited the architecture to the instruction set and execution model. All else makes up the implementation. A broader definition sets the architecture as the instruction set and structure down to the functional modules of the system. Various other definitions fall between these two extremes.

For our purposes, we will use the original definition. In other words, we define the architecture as only software-visible features including the basic instruction set and memory management architectures.
It does not include the specification of functional modules used to implement these features.

The reasons for evaluating architecture instead of implementation are as follows.

Changing an architecture, in general, implies that changes will have to be made to user application software. Since most computer vendors do not write all of their own applications and because of the enormous number of packages that would have to be updated, the cost of such a change is very high. In some cases additions to an architecture could be made in such a manner that existing user application software is both forward- and backward-compatible. In general, however, this is not the case, and the selection of a good architecture is critical.

Changes to an implementation imply that only changes to the hardware and possibly the operating system software will be necessary. Since vendors upgrade both the hardware and operating system on a regular basis (to include latest chip implementations), the added cost of changes in an implementation remains small in comparison to the user software changes. Any limitations in a given implementation can be reduced or circumvented in the next implementation. Therefore, the selection of a RISC architecture based on a given implementation is not critical.

Hewlett-Packard's PA-RISC

The primary RISC design goal is to increase processor efficiency by greatly reducing the average number of cycles expended per instruction. Researchers have found that reducing the complexity of the computer instruction set can lead to dramatic performance improvements, as well as reduced design and manufacturing costs. The result is a computer architecture with price/performance advantages over traditional CISC architectures.

HP's PA-RISC is the result of one of the most exhaustive projects undertaken by a computer vendor. The goal was to design an architecture that:

* supports the commercial, scientific, and engineering environments.
* supports both single-user and multiuser applications.
* provides superior price/performance.
* supports 64-bit virtual addressing.
* supports multiprocessing.
* scales across multiple IC design technologies.
* provides investment protection (i.e., 100% forward compatible).
* provides openness and adheres to standards.
* supports various operating environments.

This goal was achieved by analyzing billions of instructions executed by application program code acquired from real end users. This code was traced and analyzed with innovative instrumentation and techniques. Starting with a core instruction set, HP scientists strove to strike a balance, while designing an instruction set that would provide superior performance in a variety of applications. As a result, HP's PA-RISC delivers exceptional price/performance to a wide range of applications.

Design methods such as pipelining, superscalar, and superpipelining (see Appendix A for details) can be used to improve RISC performance. All 3 methods can, in theory, cover the same range of performance. Superscalar and superpipelining are more complex than simple pipelining. This additional complexity can result in higher costs, slower time-to-market, and lower reliability.

Today, superscalar or superpipelining is not needed to deliver leading edge performance. Our current PA-RISC systems are the industry leaders in performance and price/performance without using any of the more complex design methods.

HP's next generation chip (PA-RISC 7100--available during the Fall of 1992) uses a simple 2-way superscalar design approach. This approach provides the benefits of superscalar while avoiding its complexities. The new design also integrates the central processing unit and the floating-point unit on a single chip, replacing the 2 chips found in current PA-RISC systems. This will save board space and lower the cost of future PA-RISC-based systems.
A new 0.8 micron CMOS technology reduces circuit size and allows HP to design extremely dense chips for increased processor performance and reliability.

HP designed PA-RISC to deliver real and measurable benefits. The following are the major benefits:

* Investment Protection--You can be assured of a long architectural life with PA-RISC. HP has implemented PA-RISC in high-performance systems based on TTL, NMOS and CMOS technologies. Future implementations currently in our research laboratory include technologies such as BiCMOS or Gallium Arsenide. As IC technologies change, you can be assured your application code will be forward compatible. Furthermore, HP's high-end HP 9000 Series 800 Business Servers and HP 3000 Series 900 Business Systems also offer symmetric multiprocessing (SMP) capabilities today so users can have more performance if they need it. HP's SMP implementation is entirely transparent to the application, which means that existing applications can benefit from increased performance without any modifications.

* Time and Cost Savings--Time to market with RISC systems is faster because they are easier to design and manufacture. PA-RISC delivers true cost savings because HP has eliminated complex processor hardware and reduced the part count, making the system much more efficient and thus reducing power consumption and cooling requirements. Another benefit of the reduced part count is that HP's PA-RISC systems take up less floor space. And with fewer parts, there is less to break, which leads to increased reliability and less downtime, which in turn means time and cost savings.

PA-RISC versus Other Competitive RISC Architectures

Some significant differences exist between HP's PA-RISC architecture and other competitive RISC architectures. The following comparison will discuss the architectural differences (see Table 1) and their significance. Note that Table 2 lists the implementation differences.

1. Virtual Address Space (Bits): Virtual address space determines the maximum amount of data that can be used by the system at any given time. In 1986, HP became the first company to ship a RISC processor architected for 64-bit virtual addressing, a feature which has been included in all PA-RISC processors. Mips' R4000 and DEC's upcoming ALPHA both offer 64-bit architectures, but only 42 and 43 bits are implemented in the hardware, respectively. IBM offers a 52-bit architecture of which 52 bits are implemented in the hardware. Sun only offers a 33-bit architecture of which 32 bits are implemented in the hardware.

A large virtual address space offers flexibility and speed for memory intensive computing applications. For example, mapped files, artificial intelligence, object-oriented databases, and multimedia (e.g., voice and video data) applications require large amounts of virtual address space for optimum performance. Customers will realize and appreciate the flexibility of PA-RISC as applications become more memory intensive.

Computer scientists have estimated that virtual address space requirements double every year. According to this prediction, PA-RISC should maintain a high performance level far into the future.

2. Maximum Segment Size (Bits): Related to the virtual address space is segment size. With the PA-RISC and SPARC architectures the segment size is 32 bits. In other words, each user (or process) is assigned a segment (or section of memory) of up to 4 GB (2 to the 32nd power bytes). The POWER architecture supports 28 bits.

The Mips and ALPHA architectures use an unsegmented 64-bit address space. Using this method, users can be allocated more than 4 GB of space if needed. Today, very few applications need this much space. Even in situations where more than 4 GB is needed, PA-RISC will assign multiple segments to that process, although some performance overhead is required to switch between the segments.

3. General Purpose Registers: On RISC processors, registers are utilized to hold intermediate computational results, minimizing slow memory access. HP's extensive simulations have shown that 32 registers is the optimum number. Any greater number of registers reduces performance by increasing CPU cycle time without a compensating decrease in instruction path length.

SPARC is based on the idea of overlapping register windows. Currently, SPARC implementations use 120 registers to support 7 windows. At any instant, a program can access only one window, or 32 registers. This approach can present performance problems in multiuser environments. As the number of users on a system increases from a single-user environment, windows-based architectures face an increasing need to save and restore their large number of registers. This intensifying need for register management increases system overhead and reduces the proportion of processor resources that can be allocated to user processes.

Also, SPARC's large number of registers requires extensive silicon area to implement. This can increase problems for any implementation. Scalability is also a problem due to the difficulties in migrating SPARC to newer, higher performance technologies. A larger than necessary register set also contributes to longer (i.e., slower) cycle time.

PA-RISC, POWER, Mips, and ALPHA all use 32 general purpose registers. This requires far less silicon area, allowing room for other functions and features like caching and an enhanced instruction set that contribute more directly to faster application processing.

4. Floating Point and Integer Registers (Bits): There is a growing trend in technical applications toward highly accurate double-precision (64-bit) calculations. PA-RISC, POWER, Mips, and ALPHA use 64-bit floating point registers and can perform double-precision operations very quickly. These registers can store very large numbers and can perform calculations with extremely high accuracy.
SPARC has only 32-bit floating point registers.

PA-RISC, POWER, and SPARC use 32-bit integer registers, while Mips and ALPHA use 64-bit registers. A 32-bit register can store unsigned values as large as 4,294,967,295 (roughly 4.3 billion). HP has found that customers who work with numbers even larger than this use the floating point unit. Therefore, there is no need to add the extra cost of larger integer registers. The PA7100 chip can perform 64-bit floating point calculations in just 2 clock cycles, so there is little penalty (and often a significant speedup) for doing precise calculations on the floating point side.

5. Binary Coded Decimal (BCD) Support: PA-RISC includes features for efficient BCD support. POWER, Mips, SPARC, and ALPHA offer no similar support.

PA-RISC was designed for commercial applications where BCD support is very important. For example, COBOL programs frequently use BCD data in arithmetic calculations. Therefore, support of BCD translates into increased commercial performance for PA-RISC. POWER, Mips, SPARC, and ALPHA must execute longer instruction paths to handle BCD data, which may limit their commercial performance and applicability.

6. Combined Operation Support: PA-RISC can execute a test-and-branch operation with 1 instruction. POWER, Mips, SPARC, and ALPHA execute a test-and-branch operation with 2 instructions. Since 15% to 20% of all instructions encountered in a typical program execution are test-and-branch, performance drops when each of these operations takes two instructions instead of one. With PA-RISC, this feature alone delivers a 10% to 15% performance advantage over competitors. Other combined operations (e.g., add-and-branch, load-and-increment, floating-point-add-and-multiply) provide a total performance advantage of 30% over less powerful architectures such as Mips, SPARC, and ALPHA.

7. Unaligned String Support: PA-RISC selectively stores 1 to 4 bytes from a register, allowing simple handling of unaligned strings.
Unaligned strings consist of sections of stored data that must be transferred to another location in a database and then shifted to the left or right without affecting other bytes within a register. Mips, SPARC, and ALPHA have to combine many different instructions to accomplish this task.

PA-RISC's ability to support unaligned strings is important because most strings are short (i.e., partial words). These partial word stores and shifts are common in database and COBOL applications. Therefore, since PA-RISC can store partial words in one machine cycle, the processor has a speed advantage over many other RISC implementations in commercial applications.

8. Memory Protection Levels: Unlike single-user systems, multiuser systems require security from unauthorized access to sensitive files and programs. Protection is necessary to prevent loss or corruption of data. PA-RISC has a hardwired memory protection system which provides 4 privilege levels. PA-RISC offers vertical protection for "public" data across 4 different levels and horizontal protection for "private" data at each level. Calls to subsystems of limited privilege are allowed through "gateways," without passing through the most privileged level. This architected framework provides the foundation necessary for efficient, well secured multiuser systems.

POWER, Mips, SPARC, and ALPHA provide a limited protection scheme consisting of 2 privilege levels (i.e., user and supervisor modes). With this scheme, a trap will cause an entry to supervisor mode. There is no other architectural protection. This protection model is so simple that secure multiuser systems will be difficult to implement without considerable software overhead.

PA-RISC is one of the few RISC architectures that incorporates this feature. This offers extra assurance that the hardware and software can be accessed by only the appropriate people.
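
The sizes implied by the bit widths quoted in items 1, 2, and 4 above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes byte addressing, and the function and variable names are invented for this example rather than taken from any HP documentation.

```python
# Illustrative arithmetic for the bit widths quoted in items 1, 2, and 4.
# Assumption: every address names one byte (byte addressing).

GB = 2 ** 30  # bytes in one gigabyte (binary)
MB = 2 ** 20  # bytes in one megabyte (binary)


def addressable_bytes(bits: int) -> int:
    """Return how many distinct byte addresses fit in `bits` address bits."""
    return 2 ** bits


# A 32-bit segment (PA-RISC, SPARC) versus a 28-bit segment (POWER).
segment_32_bytes = addressable_bytes(32)
segment_28_bytes = addressable_bytes(28)

# How many 32-bit segments fit in a 64-bit segmented virtual address space.
segments_in_64_bit_space = addressable_bytes(64) // segment_32_bytes

# Largest unsigned value a 32-bit integer register can hold.
largest_unsigned_32 = 2 ** 32 - 1

print(segment_32_bytes // GB, "GB per 32-bit segment")
print(segment_28_bytes // MB, "MB per 28-bit segment")
print(segments_in_64_bit_space, "segments of 32 bits in a 64-bit space")
print(largest_unsigned_32, "is the largest unsigned 32-bit value")
```

Under these assumptions a 32-bit segment covers 4 GB, a 28-bit segment covers 256 MB, and a 64-bit segmented space holds 2 to the 32nd power segments of 4 GB each.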
Conclusion

Although HP's PA-RISC, IBM's POWER, Mips' RX000, Sun's SPARC, and DEC's ALPHA have all been developed with RISC concepts, major design differences exist. These architectural differences are due in part to the design approaches utilized by HP, IBM, Mips, Sun, and DEC computer engineers. HP's PA-RISC is a result of a rigorous analysis of real customer code from a variety of application areas.

In summary, HP's PA-RISC has been carefully developed to provide the following architectural advantages.

* PA-RISC is a flexible architecture providing exceptional performance across a broad range of applications in commercial, engineering, and scientific environments. None of the other RISC vendors can match our capabilities either from a performance perspective or from a flexibility perspective.
* PA-RISC's large virtual address space is capable of meeting today's accelerating memory requirements as well as meeting the demands of memory intensive applications such as object-oriented databases, artificial intelligence, and image processing. HP was the first supplier to design a 64-bit architecture.
* PA-RISC does not use a flat 64-bit virtual address space because the segmented method is less expensive. PA-RISC uses a cost-effective "segmented" approach to deliver 64-bit functionality. Customers do not pay for 64-bit features they do not need.
* PA-RISC's 32 general purpose registers require far less silicon area than SPARC's register window approach, allowing room for other functions and features like caching and an enhanced instruction set that contribute more directly to faster application processing.
* PA-RISC directly supports binary-coded decimal (BCD) and string operations frequently used in commercial languages such as COBOL, which leads to higher commercial performance.
* PA-RISC's combined operation support provides a single-cycle solution which greatly reduces path lengths (i.e., increases performance).
* PA-RISC's comprehensive protection scheme offers a higher degree of security for multiuser systems.

[Figure: conc.gal]

Appendix A--RISC CPU Architectural Capabilities

1. SUPERSCALAR

The term superscalar (derived from scalar, meaning "one-dimensional", thus superscalar meaning "multi-dimensional") refers to a design incorporating multiple execution units, each capable of executing an instruction simultaneously. For example, this may be implemented as an integer unit and a floating point unit, each of which can begin an instruction in the same clock cycle. This contrasts with most processors which may have both integer and floating point units but can begin only one instruction at a time.

While the ability to perform two or more instructions per cycle can be an advantage, current superscalar processors often have restrictions on the type of instructions that can be accepted in the same clock cycle. These processors can only accept one instruction from column "A" and one from column "B" on each cycle. If a program wishes to execute two instructions from column "A", they must be done one after the other, just like a scalar processor. Thus, the performance of such restricted superscalar processors will vary widely from application to application, depending on the mix of instruction types. These processors also require additional circuits to check for and handle the various combinations of instruction types, and special compilers which rearrange program instructions to create opportunities for parallel execution.

2. PIPELINING AND SUPERPIPELINING

The term "pipelining" is widely but not consistently used. Most high-performance CPUs today are "pipelined", which means that they execute instructions in a series of steps (or "stages"). This allows a number of instructions to be overlapped, with each instruction in a different stage. For example, a simple three stage pipeline is shown below.
[Figure: pipe.gal]

The sole benefit of pipelining is to reduce the cycle time of the processor. The cycle time (the inverse of clock frequency) of a processor can be represented as:

CYCLE TIME = GATE DELAY x CRITICAL GATES

The gate delay (in nanoseconds) is determined by the IC process, and the critical gate count is the length of the longest sequence of logic gates needed to execute any pipeline stage. For a particular IC process, cycle time can be decreased only by dividing instruction execution into multiple pieces, each requiring a smaller number of gates.

The challenge of pipelined designs is that the CPU must look for and resolve situations where one instruction in the pipeline needs information from another instruction which has not yet finished. Depending on which pipeline stages the two instructions are in, these situations are handled by "bypassing" (similar to a handoff in a relay race) or by "stalling" (where one instruction waits for the other to complete). Bypassing requires additional circuitry, while stalling slows down the processor. In addition, the number of possible combinations of stages increases the circuit complexity and schedule delays. Many current CPUs use a 4- or 5-stage pipeline to minimize the number of these interactions.

Superpipelining is an extension of pipelining techniques to very long pipelines. In principle, superpipelined CPUs have the potential for shorter cycle times because of the additional stages that they support. However, the increased number of stages creates more potential for interactions which require the added costs of bypassing or performance penalties of stalls.

Glossary

Architecture--The unique set of machine instructions that provides the conceptual basis of a computer.

Cache--Fast memory connected directly to the CPU. Systems with larger caches are usually faster because more of the program information can be kept in the cache rather than in the slower main memory.

Clock Cycle--The smallest unit of time used by a processor. This time, based on the clock signal, is used to synchronize the various processor circuits. The actual cycle time varies for different processors; the higher the clock frequency, the shorter the cycle time.

COBOL (COmmon Business Oriented Language)--A high-level programming language with similarities to English. COBOL is used primarily for business applications.

Compatibility--The ability of software developed on one machine to work on another.

Complex Instruction Set Computing (CISC)--An architecture that uses microprogramming and complex instructions.

Control Store--A special, high-speed device used to store microinstructions in a microprogrammed architecture.

Coprocessor--A special purpose processor that works with the CPU to speed up specialized operations such as floating-point arithmetic and graphics processing.

Floating point--An instruction which performs a scientific math calculation. Floating-point applications generally contain a mix of integer and floating-point instructions.

Hardwired--A type of computer on which the instruction set is implemented directly on the CPU chip.

Implementation--The actual hardware structure of the computer. An architecture can be implemented in several different circuit technologies.

Instruction Set--The set of all possible machine instructions which can be understood or executed by the computer. The instruction set defines the computer architecture.

Integer--An instruction or application which performs only simple calculations on small numbers. Integer applications contain no floating-point instructions.

Integrated Circuit (IC)--A single semiconductor device containing a large number of circuits. The density of circuits on a chip is described in degrees of integration: Large-Scale Integration (LSI) and Very Large-Scale Integration (VLSI), etc.

Machine Cycle--The period of time required by a computer to perform the most fundamental operation.

Main memory--A device capable of storing information in a binary form. Usually accessed by the processor through a memory interface chip.

Optimizing Compiler--A sophisticated compiler that intelligently translates High-Level Language programs by removing inefficiencies and unnecessary instructions. With an optimizing compiler, a program will run faster and use less memory.

Pipelining--A design whereby several machine instructions are processed or executed simultaneously.

Program--The set of commands that tell a computer what to do.

Real Address Space--The group of addresses used by the main memory. The size of this space is limited by the CPU.

Reduced Instruction Set Computing (RISC)--An architecture that features a simplified, hardwired instruction set.

Registers--Small, high-speed devices within the Execution Unit of the CPU where information is held temporarily.

Superpipelining--See Appendix A.

Superscalar--See Appendix A.

Virtual Address Space--The group of all unique memory addresses assigned to all program data and instructions being used by a system at any time. Virtual addresses are used to keep track of this information as it moves between main memory and disk. The size of this space is limited by the CPU.
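
The Clock Cycle entry and the cycle-time relation given in Appendix A (gate delay times critical gates) can be made concrete with a small calculation. This is a sketch under stated assumptions: the gate delay and the gate counts below are hypothetical round numbers chosen for illustration, not figures for any real processor, and the function names are invented for this example.

```python
# Sketch of the Appendix A relation: cycle time = gate delay x critical gates.
# All numeric inputs are hypothetical values picked for easy arithmetic.


def cycle_time_ns(gate_delay_ns: float, critical_gates: int) -> float:
    """Cycle time in nanoseconds for a given gate delay and critical path."""
    return gate_delay_ns * critical_gates


def clock_mhz(cycle_ns: float) -> float:
    """Clock frequency in MHz, the inverse of the cycle time in nanoseconds."""
    return 1000.0 / cycle_ns


GATE_DELAY_NS = 0.5    # hypothetical gate delay set by the IC process
UNPIPELINED_GATES = 60  # hypothetical gates to execute a whole instruction
STAGES = 3              # split the same work into three equal stages

# Without pipelining, the whole instruction sits on the critical path.
unpipelined_ns = cycle_time_ns(GATE_DELAY_NS, UNPIPELINED_GATES)

# With three equal stages, each stage needs a third of the gates.
pipelined_ns = cycle_time_ns(GATE_DELAY_NS, UNPIPELINED_GATES // STAGES)

print(f"unpipelined: {unpipelined_ns} ns, {clock_mhz(unpipelined_ns):.1f} MHz")
print(f"3-stage:     {pipelined_ns} ns, {clock_mhz(pipelined_ns):.1f} MHz")
```

With the same gate delay, cutting the critical path to a third of its length cuts the cycle time to a third, which is the only lever the appendix describes for a fixed IC process. The sketch ignores bypassing and stall costs, which the appendix notes grow with the number of stages.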

Table 1: PA-RISC Architecture Feature Comparison

FEATURE                       HP's     IBM's    Mips'    Sun's    DEC's
                              PA-RISC  POWER    RX000    SPARC    ALPHA
-----------------------------------------------------------------------
Virtual Address Space (Bits)  64       52       64       33       64
Maximum Segment Size (Bits)   32       28       62       32       62
General Purpose Registers     32       32       32       120      32
FP/Integer Registers (Bits)   64/32    64/32    64/64    32/32    64/64
Binary Coded Decimal Support  Yes      No       No       No       No
Combined Operation Support    Yes      Limited  No       No       No
Unaligned String Support      Yes      Yes      No       No       No
Memory Protection Levels      4        2        2        2        2

Table 2: PA-RISC Implementation Comparison

FEATURE                              HP's      IBM's    Mips'    Sun's    DEC's
                                     PA-RISC   POWER    R4000    SPARC2   ALPHA
                                     (PA7100)                             (EV-4)
--------------------------------------------------------------------------------
Clock Frequency (MHz)                100       50       50       40       150-200
Virtual Address Space (Bits)         48        52       42       33       43
Maximum Segment Size (Bits)          32        28       41       32       39
Physical Address (Bits)              32        32       36       36       34
Cache Size Maximum (MB)              3         0.064    4        0.256    8
TLB Size (Entries)                   136       160      64       64       48
Superscalar                          Yes       Yes      No       No       Yes
Superpipelined                       No        No       Yes      No       Yes
Maximum CPU Performance (SPECmarks)  >120*     100      70       24       >120**
SPECmark/MHz                         1.2       2.0      1.4      0.6      0.7

*  estimated by HP
** estimated by DEC
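
The SPECmark/MHz row of Table 2 is derived from the two rows above it, so it can be recomputed as a consistency check. The sketch below is one way to do that. It assumes the ">120" estimates are treated as exactly 120, and for ALPHA it evaluates both ends of the quoted 150-200 MHz range. The dictionary layout and names are invented for this example.

```python
# Recompute the SPECmark/MHz row of Table 2 from the rows it is derived from.
# Assumption: the ">120" estimates are treated as exactly 120 SPECmarks.

# (SPECmarks, clock frequency in MHz) for the four single-frequency entries.
table2 = {
    "PA-RISC (PA7100)": (120, 100),
    "POWER": (100, 50),
    "R4000": (70, 50),
    "SPARC2": (24, 40),
}

specmark_per_mhz = {
    name: round(specmarks / mhz, 1) for name, (specmarks, mhz) in table2.items()
}

# ALPHA (EV-4) is quoted at 150-200 MHz, so compute both ends of the range.
ALPHA_SPECMARKS = 120
alpha_low = round(ALPHA_SPECMARKS / 200, 1)   # ratio at the fastest clock
alpha_high = round(ALPHA_SPECMARKS / 150, 1)  # ratio at the slowest clock

for name, ratio in specmark_per_mhz.items():
    print(f"{name:18s} {ratio}")
print(f"{'ALPHA (EV-4)':18s} {alpha_low} to {alpha_high}")
```

Under these assumptions the four single-frequency ratios match the table (1.2, 2.0, 1.4, 0.6), and the ALPHA range of 0.6 to 0.8 brackets the 0.7 listed in the table.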